
Kaplan Meier Survival Curves


Introduction to Kaplan-Meier Method

From the previous chapter, we learned that the data layout we typically use for survival analysis is given by the table shown here:

| Ordered failure times $t_{(f)}$ | # of failures $m_f$ | # censored in $[t_{(f)}, t_{(f+1)})$, $q_f$ | Risk set $R(t_{(f)})$ |
|---|---|---|---|
| $t_{(0)} = 0$ | $m_0 = 0$ | $q_0$ | $R(t_{(0)})$ |
| $t_{(1)}$ | $m_1$ | $q_1$ | $R(t_{(1)})$ |
| $t_{(2)}$ | $m_2$ | $q_2$ | $R(t_{(2)})$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| $t_{(k)}$ | $m_k$ | $q_k$ | $R(t_{(k)})$ |

This layout is the basis upon which Kaplan-Meier survival curves are derived. The first column represents ordered survival times from smallest to largest. The second column represents frequency counts for failures at each distinct failure time. The third one represents frequency counts of those persons censored in the time interval from failure time $t_{(f)}$ up to but not including the next failure time $t_{(f+1)}$. The last column gives the risk set, which denotes the collection of individuals who have survived at least to time $t_{(f)}$.
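To make the four columns concrete, here is a minimal Python sketch that builds this layout from raw follow-up data. The function name `km_layout`, the `(times, events)` input format (1 = failure, 0 = censored), and the six-subject toy dataset are all assumptions made for illustration; they are not the chapter's data. The sketch also assumes every observed time is positive, and it reports the size of each risk set rather than listing its members.

```python
from collections import Counter

def km_layout(times, events):
    """Return rows (t_f, m_f, q_f, n_f) for the survival data layout.

    times  : observed follow-up times (assumed positive)
    events : 1 if the subject failed at that time, 0 if censored
    """
    # Number of failures at each distinct failure time.
    fail_counts = Counter(t for t, e in zip(times, events) if e == 1)
    # Ordered distinct failure times, with t_(0) = 0 placed first.
    grid = [0] + sorted(fail_counts)
    rows = []
    for idx, t in enumerate(grid):
        nxt = grid[idx + 1] if idx + 1 < len(grid) else float("inf")
        m = fail_counts.get(t, 0)
        # Censored in the half-open interval [t_(f), t_(f+1)).
        q = sum(1 for u, e in zip(times, events) if e == 0 and t <= u < nxt)
        # Size of the risk set: subjects observed at least up to t_(f).
        n = sum(1 for u in times if u >= t)
        rows.append((t, m, q, n))
    return rows

# Hypothetical toy data: six subjects, two of them censored.
toy_times = [2, 3, 3, 5, 6, 8]
toy_events = [1, 0, 1, 1, 0, 1]
toy_layout = km_layout(toy_times, toy_events)
for row in toy_layout:
    print(row)
```

In each row, the risk-set size at the next failure time equals the current size minus the failures and censorings of the current row, which is a quick way to check a hand-built table.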

An Example of Kaplan-Meier Curves

Let's return to the dataset from the last chapter:

We list these data as the KM table:

[KM tables for Group 1 and Group 2]

Each table begins with a survival time of zero, even though no subject actually failed at the start of follow-up. The reason for the zero is to allow for the possibility that some subjects might have been censored before the earliest failure time.

Each table also contains a column, denoted $n_f$, that gives the number of subjects in the risk set at the start of the interval. That is, $n_f$ counts the subjects still at risk of failing just before time $t_{(f)}$.

Now let's talk about how to compute the KM curve for group 2.

For group 2, because we don't have any censored subjects, the computation of the KM curve is straightforward.

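| $t_{(f)}$ | $n_f$ | $m_f$ | $q_f$ | $\hat S(t_{(f)})$ |
|---|---|---|---|---|
| 0 | 21 | 0 | 0 | 1 |
| 1 | 21 | 2 | 0 | 19/21 = 0.90 |
| 2 | 19 | 2 | 0 | 17/21 = 0.81 |
| 3 | 17 | 1 | 0 | 16/21 = 0.76 |
| 4 | 16 | 2 | 0 | 14/21 = 0.67 |
| 5 | 14 | 2 | 0 | 12/21 = 0.57 |
| 8 | 12 | 4 | 0 | 8/21 = 0.38 |
| 11 | 8 | 2 | 0 | 6/21 = 0.29 |
| 12 | 6 | 2 | 0 | 4/21 = 0.19 |
| 15 | 4 | 1 | 0 | 3/21 = 0.14 |
| 17 | 3 | 1 | 0 | 2/21 = 0.10 |
| 22 | 2 | 1 | 0 | 1/21 = 0.05 |
| 23 | 1 | 1 | 0 | 0/21 = 0.00 |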

Here, $\hat S(t_{(f)})$ is the survival probability at time $t_{(f)}$. The probability of surviving past the first ordered failure time of 1 week is given by 19/21 or 0.90 because 2 people failed at 1 week, so that 19 people from the original 21 remain as survivors past 1 week. Similarly, the next probability concerns subjects surviving past 2 weeks, which is 17/21 or 0.81 because 2 subjects failed at 1 week and 2 subjects failed at 2 weeks, leaving 17 out of the original 21 subjects surviving past 2 weeks.
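The counting argument above can be sketched in a few lines of Python. The `(t_f, m_f)` pairs are transcribed from the group 2 table; the function name `empirical_survival` is ours, and the approach is only valid under the stated condition that nobody is censored.

```python
# (t_f, m_f) pairs transcribed from the group 2 table.
group2_failures = [(1, 2), (2, 2), (3, 1), (4, 2), (5, 2), (8, 4),
                   (11, 2), (12, 2), (15, 1), (17, 1), (22, 1), (23, 1)]

def empirical_survival(failures, n_total):
    """Survivors past each failure time divided by the original sample size.

    Assumes no censoring, so every subject is followed until failure.
    """
    survivors = n_total
    curve = {0: 1.0}
    for t_f, m_f in failures:
        survivors -= m_f          # remove those who failed at t_f
        curve[t_f] = survivors / n_total
    return curve

s2 = empirical_survival(group2_failures, 21)
for t_f, s in s2.items():
    print(t_f, round(s, 2))
```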

Recall that no subject in group 2 was censored, so the $q$ column for group 2 consists entirely of zeros. If some of the $q$'s had been nonzero, an alternative formula for computing survival probabilities would be needed. This alternative formula is called the Kaplan-Meier (KM) approach and can be illustrated using the group 2 data even though all values of $q$ are zero.

For instance, an alternative way to calculate the survival probability of exceeding 4 weeks for the group 2 data can be written using the KM formula shown here. This formula involves the product of conditional probability terms. That is, each term in the product is the probability of exceeding a specific ordered failure time $t_{(f)}$ given that a subject survives up to that failure time. We have:

$$\hat{S}(4)=1\cdot\frac{19}{21}\cdot\frac{17}{19}\cdot\frac{16}{17}\cdot\frac{14}{16}=\frac{14}{21}=0.67$$

Thus, in the KM formula for survival past 4 weeks, the term 19/21 gives the probability of surviving past the first ordered failure time, 1 week, given survival up to the first week. Note that all 21 persons in group 2 survived up to 1 week, but that 2 failed at 1 week, leaving 19 persons surviving past 1 week.

Similarly, the term 16/17 gives the probability of surviving past the third ordered failure time at week 3, given survival up to week 3. There were 17 persons who survived up to week 3 and 1 of these then failed, leaving 16 survivors past week 3. Note that the 17 persons in the denominator represent the number in the risk set at week 3.

Notice that the product terms in the KM formula for surviving past 4 weeks stop at the 4th week with the component 14/16. Similarly, the KM formula for surviving past 8 weeks stops at the eighth week:

$$\hat{S}(8)=1\cdot\frac{19}{21}\cdot\frac{17}{19}\cdot\frac{16}{17}\cdot\frac{14}{16}\cdot\frac{12}{14}\cdot\frac{8}{12}=\frac{8}{21}=0.38$$

Generally speaking, the KM formula for a survival probability is limited to product terms up to the survival week being specified. Thus, the KM formula is often called the "product-limit" formula.
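As a check on the product-limit idea, the following sketch multiplies the conditional fractions for group 2 using exact rational arithmetic. The `(t_f, n_f, m_f)` rows come from the group 2 table; the helper name `product_limit` is an assumption for illustration. Each term is taken to be $(n_f - m_f)/n_f$, which is how the fractions 19/21, 17/19, and so on arise.

```python
from fractions import Fraction

# (t_f, n_f, m_f) rows transcribed from the group 2 table.
group2 = [(1, 21, 2), (2, 19, 2), (3, 17, 1), (4, 16, 2), (5, 14, 2),
          (8, 12, 4), (11, 8, 2), (12, 6, 2), (15, 4, 1), (17, 3, 1),
          (22, 2, 1), (23, 1, 1)]

def product_limit(rows, t):
    """Multiply conditional survival fractions for all failure times <= t."""
    s = Fraction(1)
    for t_f, n_f, m_f in rows:
        if t_f > t:
            break                      # the product is "limited" to times up to t
        s *= Fraction(n_f - m_f, n_f)  # P(T > t_f | T >= t_f)
    return s

s4 = product_limit(group2, 4)
s8 = product_limit(group2, 8)
print(s4, float(s4))
print(s8, float(s8))
```

Because nobody in group 2 is censored, the intermediate numerators and denominators cancel, so the product for week 4 reduces to 14/21 and the product for week 8 to 8/21, matching the direct counts.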

Now let's consider the KM formula for group 1 data.

| $t_{(f)}$ | $n_f$ | $m_f$ | $q_f$ | $\hat S(t_{(f)})$ |
|---|---|---|---|---|
| 0 | 21 | 0 | 0 | 1 |
| 6 | 21 | 3 | 1 | $1\times\frac{18}{21}=0.8571$ |
| 7 | 17 | 1 | 1 | $0.8571\times\frac{16}{17}=0.8067$ |
| 10 | 15 | 1 | 2 | $0.8067\times\frac{14}{15}=0.7529$ |
| 13 | 12 | 1 | 0 | $0.7529\times\frac{11}{12}=0.6902$ |
| 16 | 11 | 1 | 3 | $0.6902\times\frac{10}{11}=0.6275$ |
| 22 | 7 | 1 | 0 | $0.6275\times\frac{6}{7}=0.5378$ |
| 23 | 6 | 1 | 5 | $0.5378\times\frac{5}{6}=0.4482$ |

The other survival estimates are calculated by multiplying the estimate for the immediately preceding failure time by a fraction. For example, the fraction is 18/21 for surviving past week 6, because 21 subjects remain up to week 6 and 3 of these subjects fail to survive past week 6. The fraction is 16/17 for surviving past week 7, because 17 people remain up to week 7 and 1 of these fails to survive past week 7. The other fractions are calculated similarly.
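The recursion for group 1 can be reproduced with a short sketch. The triples below are read off the fractions in the last column of the group 1 table (18/21, 16/17, 14/15, 11/12, 10/11, 6/7, 5/6); the function name `km_recursive` is ours. Unlike the table, the code does not round at each step, but the results agree to four decimal places.

```python
# (t_f, n_f, survivors past t_f), read off the fractions in the group 1 table.
group1 = [(6, 21, 18), (7, 17, 16), (10, 15, 14), (13, 12, 11),
          (16, 11, 10), (22, 7, 6), (23, 6, 5)]

def km_recursive(rows):
    """Each estimate is the previous estimate times (survivors / n_f)."""
    estimates = {0: 1.0}
    s_prev = 1.0
    for t_f, n_f, survivors in rows:
        s_prev = s_prev * survivors / n_f
        estimates[t_f] = s_prev
    return estimates

km1 = km_recursive(group1)
for t_f, s in km1.items():
    print(t_f, round(s, 4))
```

Note that the denominators here are the risk-set sizes $n_f$, which shrink through censoring as well as failure. That is why the fractions no longer cancel as they did for group 2.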

Plots of the KM curves for groups 1 and 2 are shown here on the same graph. Notice that the KM curve for group 1 is consistently higher than the KM curve for group 2. These figures indicate that group 1, which is the treatment group, has a better survival prognosis than group 2, the placebo group.

Note that we can obtain KM plots from R using the "survival" package.


General Features of KM Curves

General KM Formula

$$\begin{aligned} \hat{S}\left(t_{(f)}\right) &= \prod_{i=1}^f \widehat{\Pr}\left[T>t_{(i)} \mid T \geq t_{(i)}\right] \\ &= \hat{S}\left(t_{(f-1)}\right) \times \widehat{\Pr}\left[T>t_{(f)} \mid T \geq t_{(f)}\right] \end{aligned}$$

For example, the probability of surviving past 10 weeks is given in the table for group 1 by .8067 times 14/15, which equals .7529. But the .8067 can be alternatively written as the product of the fractions 18/21 and 16/17. Thus, the product limit formula for surviving past 10 weeks is given by the triple product shown here.

$$\begin{aligned} \hat{S}(10) &= .8067 \times \frac{14}{15} = .7529 \\ &= \frac{18}{21} \times \frac{16}{17} \times \frac{14}{15} \end{aligned}$$
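In the worked examples, each conditional probability is estimated by the number of subjects who survive past $t_{(i)}$ divided by the number at risk at $t_{(i)}$. Under that reading (a restatement in the table's notation, with $n_i$ the risk-set size and $m_i$ the number of failures, rather than a formula given explicitly above), the general KM formula can be written as:

$$\hat{S}\left(t_{(f)}\right)=\prod_{i=1}^{f}\frac{n_i-m_i}{n_i}$$

For the group 1 example this gives

$$\hat{S}(10)=\frac{21-3}{21}\times\frac{17-1}{17}\times\frac{15-1}{15}=\frac{18}{21}\times\frac{16}{17}\times\frac{14}{15}=.7529$$

Censored subjects never appear in a numerator as failures. They affect the estimate only by leaving the risk set, which lowers $n_i$ at later failure times.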